
    Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

    Full text link
    Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model to reduce interactions with the real environment. Recent model-based RL methods additionally learn a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, to generate backward rollout trajectories. However, in this type of model-based method, the samples derived from backward rollouts and those from forward rollouts are simply aggregated together to optimize the policy via a model-free RL algorithm, which may decrease both the sample efficiency and the convergence rate. This is because such an approach ignores the fact that backward rollout traces are often generated starting from high-value states and are therefore more instructive for improving the agent's behavior. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework, in which the agent treats backward rollout traces as expert demonstrations for imitating excellent behaviors, and then collects forward rollout transitions for policy reinforcement. Consequently, BIFRL empowers the agent both to reach and to explore from high-value states more efficiently, and further reduces real interactions, making it potentially more suitable for real-robot learning. Moreover, a value-regularized generative adversarial network is introduced to augment the valuable states which are infrequently encountered by the agent. Theoretically, we provide the condition under which BIFRL is superior to the baseline methods. Experimentally, we demonstrate that BIFRL achieves better sample efficiency and produces competitive asymptotic performance on various MuJoCo locomotion tasks compared with state-of-the-art model-based methods.
    Comment: Accepted by IROS202
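
    A minimal sketch of the backward-rollout idea described above, in a toy 1-D setting. The model functions, names, and value heuristic here are illustrative assumptions, not the authors' implementation; the point is only how a backward model can synthesize demonstration-like traces that end at a high-value state.

```python
# Toy illustration of backward rollouts ending at a high-value state (an
# assumed, simplified stand-in for the BIFRL mechanism, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def forward_model(state, action):
    # Learnt forward dynamics p(s' | s, a); here a noisy linear stand-in.
    return state + action + rng.normal(0.0, 0.01)

def backward_model(state, prev_action):
    # Learnt backward dynamics p(s_{t-1} | s_t, a_{t-1}); inverts the toy dynamics.
    return state - prev_action + rng.normal(0.0, 0.01)

def backward_rollout(high_value_state, horizon, action_sampler):
    """Generate a trace ending at a high-value state by rolling backwards,
    then return it in forward (time-ordered) form so it can be consumed
    as an expert demonstration by an imitation learner."""
    states, actions = [high_value_state], []
    s = high_value_state
    for _ in range(horizon):
        a = action_sampler(s)
        s = backward_model(s, a)   # step to the predicted previous state
        states.append(s)
        actions.append(a)
    states.reverse()
    actions.reverse()
    return list(zip(states, actions, states[1:]))  # (s, a, s') transitions

def forward_rollout(state, horizon, policy):
    """Standard forward rollout used for policy reinforcement."""
    transitions = []
    for _ in range(horizon):
        a = policy(state)
        nxt = forward_model(state, a)
        transitions.append((state, a, nxt))
        state = nxt
    return transitions

policy = lambda s: rng.normal(0.0, 0.1)
demo = backward_rollout(high_value_state=1.0, horizon=5, action_sampler=policy)
exp = forward_rollout(state=0.0, horizon=5, policy=policy)
print(len(demo), "imitation transitions;", len(exp), "reinforcement transitions")
```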

    The global solution of the minimal surface flow and translating surfaces

    Full text link
    In this paper, we study evolved surfaces over convex planar domains which evolve by the minimal surface flow
    $$u_t = \operatorname{div}\!\left(\frac{Du}{\sqrt{1+|Du|^2}}\right) - H(x, Du).$$
    Here, we specify the angle of contact of the evolved surface with the boundary cylinder. An interesting question is to find translating solitons of the form $u(x,t) = \omega t + w(x)$, where $\omega \in \mathbb{R}$. Under an angle condition, we prove that an a priori estimate holds for the translating solitons (i.e., translators), which guarantees that the solitons exist. Under suitable conditions on $H(x,p)$, we prove that the minimal surface flow admits a global solution. We then show, provided the soliton exists, that the global solutions converge to some translator.
    Comment: 16 page
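
    For the reader's convenience, substituting the translating ansatz into the flow shows what equation a translator must solve; this follows directly from the definitions in the abstract.

```latex
% Substituting u(x,t) = \omega t + w(x) into the flow: u_t = \omega and
% Du = Dw, so the parabolic PDE reduces to the elliptic equation
\[
  \omega \;=\; \operatorname{div}\!\left(\frac{Dw}{\sqrt{1+|Dw|^2}}\right)
  - H(x, Dw) \qquad \text{in } \Omega,
\]
% coupled with the prescribed contact-angle condition on \partial\Omega.
% A pair (\omega, w) solving this problem is the translator the paper seeks.
```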

    Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

    Full text link
    Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints. While current deep reinforcement learning (DRL) methods for online 3D-BPP have shown promising results in optimizing average performance over an underlying box sequence distribution, they often fail in real-world settings where worst-case scenarios can materialize. Standard robust DRL algorithms tend to overly prioritize worst-case performance at the expense of performance under the nominal problem instance distribution. To address these issues, we first introduce a permutation-based attacker to investigate the practical robustness of both DRL-based and heuristic methods proposed for solving online 3D-BPP. Then, we propose an adjustable robust reinforcement learning (AR2L) framework that allows efficient adjustment of robustness weights to achieve the desired balance between the policy's performance in average and worst-case environments. Specifically, we formulate the objective function as a weighted sum of expected and worst-case returns, and derive a lower performance bound by relating it to the return under a mixture dynamics. To realize this lower bound, we adopt an iterative procedure that searches for the associated mixture dynamics and improves the corresponding policy. We integrate this procedure into two popular robust adversarial algorithms to develop the exact and approximate AR2L algorithms. Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance in the nominal case.
    Comment: Accepted to NeurIPS202
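
    The weighted objective described in the abstract can be written out explicitly. The weight $\alpha$ and the dynamics symbols below are our notation, assumed for illustration, not necessarily the paper's.

```latex
% Adjustable robust objective, in assumed notation:
\[
  J_\alpha(\pi) \;=\; \alpha\,
  \mathbb{E}_{\tau \sim (\pi,\,P_{\mathrm{nom}})}\!\big[R(\tau)\big]
  \;+\; (1-\alpha)\,
  \min_{P \in \mathcal{P}}\,
  \mathbb{E}_{\tau \sim (\pi,\,P)}\!\big[R(\tau)\big],
  \qquad \alpha \in [0,1],
\]
% where P_nom is the nominal box-sequence dynamics and the min over the
% uncertainty set \mathcal{P} captures the worst case; \alpha = 1 recovers
% standard average-case DRL and \alpha = 0 recovers fully robust training.
```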

    Laxity-Aware Scalable Reinforcement Learning for HVAC Control

    Full text link
    Demand flexibility plays a vital role in maintaining grid balance, reducing peak demand, and saving customers' energy bills. Given their highly shiftable load and significant contribution to a building's energy consumption, Heating, Ventilation, and Air Conditioning (HVAC) systems can provide valuable demand flexibility to power systems by adjusting their energy consumption in response to electricity prices and power system needs. To exploit this flexibility in both operation time and power, it is imperative to accurately model and aggregate the load flexibility of a large population of HVAC systems, as well as to design effective control algorithms. In this paper, we tackle the curse-of-dimensionality issue in modeling and control by utilizing the concept of laxity to quantify the urgency of each HVAC operation request. We further propose a two-level approach to address energy optimization for a large population of HVAC systems. The lower level involves an aggregator that aggregates HVAC load laxity information and uses the least-laxity-first (LLF) rule to allocate real-time power to individual HVAC systems based on the controller's total power. Due to the complex and uncertain nature of HVAC systems, we leverage a reinforcement learning (RL)-based controller to schedule the total power based on the aggregated laxity information and the electricity price. We evaluate the temperature control and energy cost saving performance of a large-scale group of HVAC systems in both single-zone and multi-zone scenarios, under varying climate and electricity market conditions. The experiment results indicate that the proposed approach outperforms the centralized methods in the majority of test scenarios, and performs comparably to the model-based method in some scenarios.
    Comment: In Submissio
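
    A minimal sketch of least-laxity-first allocation as the abstract describes it. The field names and the particular laxity formula below are illustrative assumptions in a simple discrete-time model, not the paper's exact definitions.

```python
# Assumed toy model of LLF power allocation within an RL-scheduled budget.
from dataclasses import dataclass

@dataclass
class HvacRequest:
    unit_id: str
    power_kw: float        # power drawn when the unit runs this step
    remaining_steps: int   # steps of operation still required
    deadline_steps: int    # steps until the comfort band is violated

    @property
    def laxity(self) -> int:
        # Laxity: slack before the request becomes critical; zero or
        # negative laxity means the unit must run now.
        return self.deadline_steps - self.remaining_steps

def llf_allocate(requests: list[HvacRequest], total_power_kw: float) -> list[str]:
    """Grant power to the least-lax units first, within the total budget."""
    granted, used = [], 0.0
    for req in sorted(requests, key=lambda r: r.laxity):
        if used + req.power_kw <= total_power_kw:
            granted.append(req.unit_id)
            used += req.power_kw
    return granted

if __name__ == "__main__":
    fleet = [
        HvacRequest("A", 3.0, remaining_steps=4, deadline_steps=5),  # laxity 1
        HvacRequest("B", 3.0, remaining_steps=2, deadline_steps=8),  # laxity 6
        HvacRequest("C", 3.0, remaining_steps=3, deadline_steps=3),  # laxity 0
    ]
    print(llf_allocate(fleet, total_power_kw=6.0))  # -> ['C', 'A']
```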

    On the Jets Induced by a Cavitation Bubble Near a Cylinder

    Full text link
    The dynamics of cavitation bubbles in the vicinity of a solid cylinder or fibre are seen in water treatment, demolition and/or cleaning of composite materials, as well as biomedical scenarios such as ultrasound-induced bubbles near tubular structures in the body. When the bubble collapses near the surface, violent fluid jets may be generated. Understanding whether these jets occur and predicting their directions -- departing from or approaching the solid surface -- is crucial for assessing their potential impact on the solid phase. However, the criteria for classifying the onset and directions of the jets created by cavitation near the curved surface of a cylinder have not been established. In this research, we present models to predict the occurrence and directions of the jet in such scenarios. The onset criteria and the direction(s) of the jets are dictated by the bubble stand-off distance and the cylinder diameter. Our models are validated by comprehensive experiments. The results not only predict the jetting behaviour but can also serve as guidelines for designing and controlling the jets when a cavitation bubble collapses near a cylinder, whether for protective or destructive purposes.
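
    The two controlling parameters named in the abstract are conventionally made dimensionless in cavitation studies; the definitions below follow that standard convention and are our assumption, not necessarily the paper's exact formulation.

```latex
% Assumed non-dimensional form of the two controlling parameters:
\[
  \gamma \;=\; \frac{d}{R_{\max}}, \qquad \eta \;=\; \frac{D}{R_{\max}},
\]
% where d is the distance from the bubble's inception point to the cylinder
% surface, R_max is the maximum bubble radius, and D is the cylinder
% diameter; jet onset and direction would then be classified in the
% (\gamma, \eta) parameter plane.
```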

    Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

    Full text link
    Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function from collected expert demonstrations. Since obtaining expert demonstrations can be costly, current IRL techniques focus on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations. However, existing IRL algorithms primarily tackle the challenge of trajectory-ranking ambiguity when learning the reward function. They overlook the crucial role of the degree of difference between trajectories in terms of their returns, which is essential for further removing reward ambiguity. Additionally, the reward of a single transition is heavily influenced by the contextual information within the trajectory. To address these issues, we introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework. Unlike existing approaches, DRASRL takes into account both the ranking of trajectories and the degrees of dissimilarity between them to collaboratively eliminate reward ambiguity when learning a sequence of contextually informed reward signals. Specifically, we leverage the distance between the policies from which the trajectories are generated as a measure to quantify the degree of difference between traces. This distance-aware information is then used to infer embeddings in the representation space for reward learning, employing the contrastive learning technique. Meanwhile, we integrate a pairwise ranking loss function to incorporate ranking information into the latent features. Moreover, we resort to the Transformer architecture to capture the contextual dependencies within the trajectories in the latent space, leading to more accurate reward estimation. Through extensive experimentation, our DRASRL framework demonstrates significant performance improvements over previous SOTA methods.
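
    A minimal PyTorch sketch of the pairwise trajectory-ranking loss the abstract mentions, in the Bradley-Terry / T-REX style common in preference-based reward learning. The reward network, tensor shapes, and the margin-free form are our assumptions, not DRASRL's exact loss; in particular, we replace the Transformer context encoder with an MLP for brevity.

```python
# Assumed pairwise ranking loss for reward learning from ranked trajectories.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Per-transition reward model; a stand-in for the contextual encoder."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def trajectory_return(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> scalar predicted return (sum of step rewards)
        return self.net(traj).sum()

def ranking_loss(model: RewardNet, better: torch.Tensor,
                 worse: torch.Tensor) -> torch.Tensor:
    """Cross-entropy that the higher-ranked trajectory gets the higher return."""
    returns = torch.stack([model.trajectory_return(better),
                           model.trajectory_return(worse)])
    return -torch.log_softmax(returns, dim=0)[0]

obs_dim = 8
model = RewardNet(obs_dim)
better_traj = torch.randn(20, obs_dim)   # trajectory ranked higher
worse_traj = torch.randn(20, obs_dim)    # trajectory ranked lower
loss = ranking_loss(model, better_traj, worse_traj)
loss.backward()                           # gradients flow into the reward net
print(float(loss))
```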

    Structural analysis of a novel rabbit monoclonal antibody R53 targeting an epitope in HIV-1 gp120 C4 region critical for receptor and co-receptor binding

    Get PDF
    The fourth conserved region (C4) in the HIV-1 envelope glycoprotein (Env) gp120 is a structural element that is important for its function, as it binds to both the receptor CD4 and the co-receptor CCR5/CXCR4. It has long been known that this region is highly immunogenic and that it harbors B-cell as well as T-cell epitopes. In animal studies, it is the target of a number of antibodies, known as CD4-blockers. However, the mechanism by which the virus shields itself from such antibody responses is not known. Here, using a novel anti-C4 rabbit monoclonal antibody, R53, we determined the crystal structure of R53 in complex with its epitope peptide. Our data show that although the epitope of R53 covers a highly conserved sequence, (433)AMYAPPI(439), it is occluded in the gp120 trimer and in the CD4-bound conformation. Our results suggest a masking mechanism that explains how HIV-1 protects this critical region from the human immune system.

    Trace element zinc and skin disorders

    Get PDF
    Zinc is an essential trace element and an important constituent of proteins and other biological molecules. It has many biological functions, including antioxidant activity, maintenance of skin and mucous membrane integrity, and the promotion of various enzymatic and transcriptional responses. The skin contains the third-highest amount of zinc in the body. Zinc deficiency can lead to a range of skin diseases. Apart from acrodermatitis enteropathica, a rare genetic zinc-deficiency disorder, zinc deficiency has also been reported in other diseases. In recent years, zinc supplementation has been widely used for various skin conditions, including infectious diseases (viral warts, genital herpes, cutaneous leishmaniasis, leprosy), inflammatory diseases (hidradenitis suppurativa, acne vulgaris, rosacea, eczematous dermatitis, seborrheic dermatitis, psoriasis, Behcet's disease, oral lichen planus), pigmentary diseases (vitiligo, melasma), tumor-associated diseases (basal cell carcinoma), endocrine and metabolic diseases (necrolytic migratory erythema, necrolytic acral erythema), hair diseases (alopecia), and so on. We reviewed the literature on the application of zinc in dermatology to provide a reference for its better use.

    Rabbit anti-HIV-1 monoclonal antibodies raised by immunization can mimic the antigen-binding modes of antibodies derived from HIV-1-infected humans

    Get PDF
    The rabbit is a commonly used animal model for studying antibody responses in HIV/AIDS vaccine development. However, no rabbit monoclonal antibodies (MAbs) had previously been developed to study the epitope-specific antibody responses against HIV-1 envelope (Env) glycoproteins, and little is known about how the rabbit immune system can mimic the human immune system in eliciting such antibodies. Here we present structural analyses of two rabbit MAbs, R56 and R20, against the third variable region (V3) of HIV-1 gp120. R56 recognizes the well-studied immunogenic region in the V3 crown, while R20 targets a less-studied region at the C terminus of V3. By comparing the Fab/epitope complex structures of these two antibodies, raised by immunization, with those of the corresponding human antibodies derived from patients chronically infected with HIV-1, we found that rabbit antibodies can recognize immunogenic regions of gp120 and mimic the binding modes of human antibodies. This result provides new insight into the use of the rabbit as an animal model in AIDS vaccine development.